Transcription of Endogenous Retroviruses: Broad and Precise Mechanisms of Control

Endogenous retroviruses (ERVs) are the remnants of retroviral germline infections and are highly abundant in the genomes of vertebrates. At one time considered to be nothing more than inert ‘junk’ within genomes, ERVs have been tolerated within host genomes over vast timescales, and their study continues to reveal complex co-evolutionary histories within their respective host species. For example, multiple instances have been characterized of ERVs having been ‘borrowed’ for normal physiology, from single copies to ones involved in various regulatory networks such as innate immunity and during early development. Within the cell, the accessibility of ERVs is normally tightly controlled by epigenetic mechanisms such as DNA methylation or histone modifications. However, these silencing mechanisms of ERVs are reversible, and epigenetic alterations to the chromatin landscape can thus lead to their aberrant expression, as is observed in abnormal cellular environments such as in tumors. In this review, we focus on ERV transcriptional control and draw parallels and distinctions concerning the loss of regulation in disease, as well as their precise regulation in early development.


Introduction
Retroelements comprise a major class of transposable elements (TEs) that are characterized by mobilization involving the reverse transcription of an RNA intermediate transcribed from an existing element [1]. Reverse transcription of the intermediate results in a double-stranded DNA (dsDNA) that is then reintroduced into the genome at a unique position along a chromosome by integration. As the original element is left intact, this process is commonly referred to as a 'copy and paste' mechanism of amplification. Retroelements are further classified based on the presence or absence of long terminal repeats (LTRs) and are referred to as LTR and non-LTR retroelements (Figures 1A and 1B, respectively) [1]. Under canonical conditions, the spread of non-LTR elements is restricted to the cell in which they are mobilized, whereas LTR retroelements originate from the germline infection of exogenous retroviruses, and therefore, their ability to spread involves leaving the cell [2]. To avoid negative effects that could arise if retroelements were expressed, host cells have evolved several mechanisms to tightly control their transcription [3]. The ability to control these elements permitted their functional exaptation or 'repurposing' within the host genome, and retroelements have recently been characterized for their use in regulatory networks, such as those of innate immunity and embryogenesis [4,5]. However, the deregulation of TEs is commonly observed in cancers and other diseases and can negatively impact the expression of local genes or promote oncogenic effects through various mechanisms [6,7]. In this review, we focus on recent advances from studies of ERVs concerning their transcriptional regulation in health and disease.

Endogenous Retroviruses
Retroviruses are positive-sense single-stranded RNA (ssRNA) viruses that have been infecting mammals and other vertebrates for hundreds of millions of years [8-11]. The retrovirus replication cycle is unique due to the hallmark requirement that, to establish a productive infection, the viral ssRNAs must be reverse transcribed to produce a double-stranded DNA (dsDNA) molecule that is then permanently integrated into the host cell's genome [12]. Following integration, there is no mechanism of excision, and consequently, the integrated form is stably inherited as a genetic component of the cell and referred to as a provirus [12]. Due to the integration of the reverse-transcribed dsDNA molecule, infection of the germline (e.g., sperm or egg cells or during very early embryogenesis) leads to a provirus that has the potential to be transmitted vertically to offspring in a Mendelian fashion, referred to as an endogenous retrovirus (ERV) [2,13].
At the time of integration, a canonical full-length ERV retains the characteristic properties of a replication-competent integrated provirus [12]. Structurally, the ERV consists of a long directly repeated sequence at each terminus (the 5′ LTR and 3′ LTR), which flank an internal segment that includes the protein-coding genes required for replication (Figure 1A). Minimally, these comprise gag, pro/pol, and env [14]. Briefly, gag encodes structural proteins; pro/pol the enzymatic functions, including protease, reverse transcriptase, and integrase; and env the envelope surface glycoprotein that mediates receptor recognition and membrane fusion [14]. The internal 5′ untranslated region (UTR) upstream of gag houses a primer binding site (PBS) whose sequence is complementary to the cellular tRNA used to prime reverse transcription. Once integrated, the LTRs provide regulatory functions for the transcription and processing of spliced, as well as full-length, mRNAs that will ultimately be used as templates for protein synthesis or incorporated into budding virions [14]. In the absence of selection, mutations accumulate randomly at the neutral rate of the host, a rate markedly slower than that of exogenous replication [15]. Thus, ERVs provide a fossilized record of once (or still) infectious retroviral lineages. The majority of ERVs are ancient and have lost the ability to leave the cell due to accumulated mutations resulting in their decay [15]. However, some are observed to maintain intact genes due to benefits offered to the host or remain transcriptionally regulated despite replication incompetence. Several species' genomes harbor ERV lineages with evidence of recent or ongoing germline invasion, as inferred by the presence of new copies (Figure 2A). These 'young' ERVs tend to bear close sequence homology to their exogenous source and may retain transcriptional activities or possess one or more open reading frames (ORFs). Recent studies have drawn attention to such lineages in felines [16,17], wolf-like canids [18,19], mule deer [20,21], bovines [22], and koalas [23-25].
Due to the mechanism of reverse transcription, the 5′ and 3′ LTRs are identical in sequence at the time of integration and subsequently diverge [12]. Proviral LTRs are observed to undergo recombinational deletion, leading to the formation of a solitary LTR (solo-LTR) and resultant loss of the internal coding portion (Figure 2B). Therefore, up to three alleles may be present for a given insertion: a full-length provirus, a solo-LTR, or (prior to fixation) the unoccupied site (Figure 2B) [26-29]. In general, solo-LTR formation tends to favor identical LTRs and thus appears to be inversely correlated with age [27]. However, deviations from this trend are observed, hinting that the pressures leading to solo-LTR formation are complex and likely to involve factors aside from sequence identity between the LTRs [28,30-33]. For a solo-LTR generated from identical pairs, the full nucleotide sequence should, in principle, be preserved, and the recombinant allele should likewise retain the same potential for function. As with other repetitive elements, ERVs provide sources of genomic templates that can seed larger chromosomal rearrangements [34,35] or facilitate ectopic (non-allelic) gene conversion, resulting in the transfer of sequence information from highly similar but non-allelic ERV loci, thus influencing conversion 'hotspots' [26,34,36]. Well-characterized ERV-related hotspots are present within the human male-specific Y region (e.g., ERV1 LTR2 and LTR24 groups) [37]. ERV genes can also be subject to conversion, for example, the maintenance of the internal gene sequence as evidenced for ERV-V env (e.g., preservation of ENVV1 in humans and simian primates) [38], as well as ERV-V gag (involving gagV1 and gagV3 in non-ape simian primates) [39].
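Because the two LTRs of a provirus start out identical and then accumulate neutral mutations independently, their pairwise divergence is commonly used as a rough molecular clock for the insertion. The following is a minimal sketch of that idea, not a method taken from the studies cited here. It assumes the two LTRs are already aligned, uses an uncorrected p-distance (which ignores multiple substitutions at one site and any gene conversion between the LTRs), and takes the neutral substitution rate as a user-supplied parameter. The function names and the toy sequences are hypothetical.

```python
def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two pre-aligned LTRs.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def ltr_age_years(divergence: float, rate_per_site_per_year: float) -> float:
    """Age estimate T = K / (2r).

    The factor of 2 reflects that both LTRs diverge independently
    from the same starting sequence.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)


# Toy aligned LTR pair (invented sequences): 2 mismatches over 20 sites.
five_prime = "ACGTACGTACGTACGTACGT"
three_prime = "ACGTACGAACGTACGTACCT"
k = p_distance(five_prime, three_prime)

# The rate below is an assumed placeholder, not a value from the text;
# substitute a host-appropriate neutral rate in practice.
assumed_rate = 1e-9
print(f"divergence = {k:.3f}, age = {ltr_age_years(k, assumed_rate):.3g} years")
```

Estimates of this kind are best read as approximate, since the deviations noted above (for example, conversion between LTRs) can make an element appear younger than it is.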
Germline colonization followed by vertical passage has been a successful strategy for retroviruses [2,15]. For example, ERVs recognizably account for 3.5 and 6% of the domestic dog and cat reference genomes, respectively, and 8 and 10% of the human and mouse reference genomes [40-43]. Upon their discovery, these elements were rightfully recognized as 'viral fossils' but were often referred to as 'junk DNA' and widely assumed to be inert [44]. Indeed, the repertoire of ERVs within a genome can be viewed as a limited but accessible record of once-infectious viruses ranging from the ancient to those still endogenizing a species [45]. Within this fossil record, the molecular signatures of past virus-host interactions may be gleaned, as well as subsequent co-evolutionary patterns between the two [2,45,46]. To say there are a growing number of exceptions to the 'junk' in our genomes is an understatement.

ERV Nomenclature
Traditionally, ERVs have been principally classified by sequence homology of the pol gene with exogenous Retroviridae [47,48], which comprises two subfamilies (Orthoretrovirinae and Spumaretrovirinae) and 11 genera, according to the 2021 International Committee on Taxonomy of Viruses [49]. This classification scheme is further designated by one of three conventional classes: class I elements are similar to gamma- and epsilon-like retroviruses; class II are similar to alpha-, beta-, and delta-like retroviruses; class III are similar to the spuma-like retroviruses [50]. The nomenclature can be further adapted to notate ERVs by species presence using one or two letters (e.g., human ERV, HERV; Canis familiaris, CfERV), which may be accompanied by specification of the tRNA inferred to prime reverse transcription. For example, HERV-K members (class II, beta-like) have PBS sequence similarity to a lysine tRNA (tRNA-Lys) [2]. These qualifiers are integrated into the RepBase classification of ERV/LTRs, which accounts for genomic presence by species [51,52]. Regarding ERVs, this classification is delineated by 'superfamily' (ERV1, ERV2, and ERV3; corresponding to class I, II, and III described above), followed by group, associated proviral sequence ('-int'), and associated LTR [51,52]. For example, all human class II elements are beta-like; the youngest HERVs thus belong to ERV2 HML-2 HERV-K-int LTR5Hs [51]. ERV loci are further discriminated by chromosomal location using the cytoband (e.g., HERV-K 11p15.4) [53,54]. A proposed systematic nomenclature incorporates element type, locus-specific information, and species annotation to account for orthologs between species, as well as insertionally polymorphic loci [47]. Given the growing number of identified ERVs over time [3], the challenges of adopting such a revised, if common, system are obvious.
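The layered naming scheme described above (superfamily, group, internal sequence, LTR) can be captured in a small data structure. The sketch below is illustrative only: the whitespace-delimited four-field string mirrors how the example is written in this text and is an assumption, not a documented RepBase file format, and the names `ErvLabel` and `parse_label` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Superfamily -> (class, related exogenous retrovirus groups), as described in the text.
SUPERFAMILY_TO_CLASS = {
    "ERV1": ("I", ("gamma", "epsilon")),
    "ERV2": ("II", ("alpha", "beta", "delta")),
    "ERV3": ("III", ("spuma",)),
}


@dataclass(frozen=True)
class ErvLabel:
    superfamily: str  # e.g. "ERV2"
    group: str        # e.g. "HML-2"
    internal: str     # proviral sequence name, ends in "-int"
    ltr: str          # associated LTR, e.g. "LTR5Hs"

    @property
    def erv_class(self) -> str:
        return SUPERFAMILY_TO_CLASS[self.superfamily][0]

    @property
    def related_genera(self) -> Tuple[str, ...]:
        return SUPERFAMILY_TO_CLASS[self.superfamily][1]


def parse_label(label: str) -> ErvLabel:
    """Parse a label such as 'ERV2 HML-2 HERV-K-int LTR5Hs' (assumed layout)."""
    parts = label.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {label!r}")
    superfamily, group, internal, ltr = parts
    if superfamily not in SUPERFAMILY_TO_CLASS:
        raise ValueError(f"unknown superfamily: {superfamily!r}")
    if not internal.endswith("-int"):
        raise ValueError(f"internal sequence should end in '-int': {internal!r}")
    return ErvLabel(superfamily, group, internal, ltr)


youngest_herv = parse_label("ERV2 HML-2 HERV-K-int LTR5Hs")
print(youngest_herv.erv_class, youngest_herv.related_genera)
```

Keeping the class mapping in one table makes it easy to extend the label with locus-level fields (such as a cytoband) without changing the parsing of the group-level fields.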
One of the most abundant ERVs in human genomes is the ERV1 gamma-like HERV-H, which entered the germline ~40-25 mya, prior to the New/Old World monkey split, and was then amplified mostly in Old World monkeys (OWMs) [70]. Subsequent waves of propagation over timeframes of ~20-9 mya and ~10-4 mya drove expansions of env-deficient copies [32,70-72]. As is reflected in RepBase, HERV-H LTRs are traditionally classified into four subgroups (LTR7, 7b, 7c, and 7y); their recent phylogenetic refinement identifies eight previously unrecognized ones, the youngest from the proposed classifications of 7up1/2, 7u, and 7y copies (reported in Dfam) [72]. The refined analysis reveals a dynamic recombination-driven history of HERV-H LTRs involving the gain, loss, and exchange of cis-regulatory functions contributing to subgroup-specific functional motifs [72]. HERV-H is notable for departing from the allelic distribution of most ERV groups, in that proviruses account for >60% of all its loci [28,31,33]. Though an explanation is not entirely clear, this shift in provirus presence hints at selective constraints of internal sequence properties [30-32].

Regulatory Features of ERVs
ERVs exert dramatic influence on the transcriptional landscape as well as the evolutionary shaping of the host genome. Many members of ERV lineages have retained biological properties and have been 'borrowed' for a benefit offered to the host, in which they are regulated (Figure 3A). In particular, the LTRs possess regulatory features for transcription by cellular machinery and can therefore act as promoters or long-range enhancers of host genes [13]. Likewise, host species possess repressive mechanisms to recognize ERVs and exert control over their activation [3]. Importantly, the potential of an ERV to be expressed is not limited to LTR-driven transcriptional mechanisms. LTRs may also be embedded within transcripts by readthrough from the transcription of alternate promoters of conventional genes (or even other LTRs) or can be spliced into mRNAs along with the functional sequence (Figure 3B). lncRNAs too were previously thought to have no biological function, but growing evidence implicates the functional relevance of lncRNAs, including those associated with ERVs [81-85]. Owing to these collective properties, ERVs are now recognized as a major force of regulatory innovation [5,29,86].

ERV LTRs Are Enriched in Transcription Factor Binding Sites
All retroviral LTRs, and, hence, those sourced from an ERV, possess a modular structure of unique segments U3 and U5 that are separated by a repeat segment R (5′ U3-R-U5 3′) (Figure 3A). Within these segments are regulatory cis-acting sequences corresponding to transcription factor (TF) binding sites (TFBSs) and the RNA Pol II TATA-box-like core promoter (usually in the U3) and a polyadenylation signal (usually in the R) [87]. Of note, the presence, placement, and sequence of these motifs can vary widely across ERV lineages [88,89].
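As a rough illustration of the U3-R-U5 layout described above, the sketch below scans each segment for a TATA-box-like motif and a polyadenylation signal and reports positions relative to the start of the LTR. The two patterns are deliberately simplified textbook consensus sequences chosen for this example, and the segment sequences are invented; real LTRs vary widely in motif content, as noted above, and segment boundaries must come from annotation. The function name `scan_ltr` is hypothetical.

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
TATA_LIKE = re.compile(r"TATA[AT]A")
POLYA_SIGNAL = re.compile(r"AATAAA")


def scan_ltr(u3: str, r: str, u5: str) -> dict:
    """Return motif start positions (0-based, relative to LTR start) per segment."""
    hits = {}
    offset = 0
    for name, segment in (("U3", u3), ("R", r), ("U5", u5)):
        segment = segment.upper()
        hits[name] = {
            "tata_like": [offset + m.start() for m in TATA_LIKE.finditer(segment)],
            "polyA_signal": [offset + m.start() for m in POLYA_SIGNAL.finditer(segment)],
        }
        offset += len(segment)
    return hits


# Toy segments (invented): a TATA-like motif in U3 and a polyA signal in R.
toy_u3 = "GGCCTATAAAGGCC"
toy_r = "GCGCAATAAAGCGC"
toy_u5 = "CCGGCCGG"
toy_hits = scan_ltr(toy_u3, toy_r, toy_u5)
for segment_name, found in toy_hits.items():
    print(segment_name, found)
```

A plain pattern scan like this only flags candidate sites; whether a motif is actually bound or used depends on the chromatin context discussed in the following sections.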
LTRs are highly enriched for TFBSs or combinations thereof, implying that ERV propagation results in the deposition not only of canonical promoters but also of directly associated cis-acting regulatory sequences. Curation of TFBS presence within ERVs implies the functional evolution of such sites. For example, an analysis of ENCODE TFBS profiles from 13 human primary cell lines found roughly 15% overlap with LTRs, of which 8% lay within 10 kb of a predicted gene transcription start site (TSS) [90]. An analysis of ENCODE and Roadmap Epigenomics ChIP-seq data for 97 TFs identified 794,972 ERV-encoded TFBSs over the human genome [86]. These can be parsed into clusters involved in shared regulatory functions, as inferred by the presence of common TFBSs (i.e., HERV/LTR shared regulatory element or HSRE). In this study, the authors identified eight such HSREs and their differential presence over ERV LTR groups, for example, the pluripotency cluster TFBSs SOX2, OCT4, and NANOG; embryonic endoderm cluster TFs GATA4/6, SOX17, and FOXA1/2; B-lymphocyte cluster TFs PAX5 and PBX3; and the chromatin architecture TF CTCF; many general TFBSs are present as well [86]. Importantly, HSRE presence is not fully consistent with ERV phylogenetic classifications, and HSREs are instead differentially enriched within LTRs from distinct groups [86]. Younger ERV groups (e.g., LTR7 members, LTR5Hs, LTR6A, and MER11C) tend to have more pluripotency-associated TFBSs; these TFBSs are rarely observed in exogenous viruses [86]. Generally, young LTRs tend to be CpG-rich, and CpG-rich LTRs are more likely to be bound by transcription initiation-associated TFs than CpG-depleted ones [91]. Long term, CpG sites are inevitably lost due to deamination and other mutations [91]. LTRs from older groups are overrepresented in enhancer regions compared to younger groups, suggesting that the likelihood of an element serving a regulatory function increases with age [91]. Based on data of chromatin accessibility and modification, a recent analysis of ENCODE data identified >924,000 candidate cis-regulatory elements (cCREs) in the human genome [92], of which 10.2% are primate-specific based on a comparison of 241 genomes of placental mammals of the Zoonomia Project [93]; 90% of these cCREs overlap TEs, of which 34.9% are within LTRs [92,93]. Thus, LTRs may account for around one-third (and TEs may account for nearly all) of primate-specific cis-regulatory elements. A subsequent study of 367 TFs identified ~15.6 million TFBSs using ChIP-seq data of 785 cell and tissue types, of which 24.5% are primate-specific; 86.1% of these TFBSs overlap TEs, of which 22.4% are in LTRs [93]. Thus, a significant potential for regulatory innovation in primates appears to lie in ERVs and other TEs. It is important to remember that mutations post-insertion may impact the functional potential of LTR use, for example, by altering TFBS motifs or methylation sites. Such changes are subject to drift or other modes of selection and thus may vary in presence among individuals within a population. A population genetics approach is offered from the analysis of unique TFBSs present in the 5′ LTRs of HERV-K proviruses using the 1000 Genomes Project data [94].
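The nested percentages quoted above can be combined with simple multiplication to see where the "around one-third" figure comes from. The sketch below is a worked example using only the numbers given in the text. It assumes each "of which" percentage applies to the subset named immediately before it (for example, that 34.9% refers to the TE-overlapping, primate-specific cCREs), and it treats ">924,000" as a lower bound. The helper name `nested_share` is hypothetical.

```python
def nested_share(*fractions: float) -> float:
    """Multiply successive 'of which' fractions into one overall share."""
    share = 1.0
    for fraction in fractions:
        share *= fraction
    return share


# Candidate cis-regulatory elements (cCREs), figures as quoted in the text.
total_ccres = 924_000                 # lower bound (">924,000")
primate_specific_ccre = 0.102
ccre_overlapping_te = 0.90
ccre_te_within_ltr = 0.349

ccre_ltr_share = nested_share(ccre_overlapping_te, ccre_te_within_ltr)
ccre_ltr_count = total_ccres * nested_share(
    primate_specific_ccre, ccre_overlapping_te, ccre_te_within_ltr
)

# Transcription factor binding sites (TFBSs), figures as quoted in the text.
total_tfbs = 15_600_000               # "~15.6 million"
primate_specific_tfbs = 0.245
tfbs_overlapping_te = 0.861
tfbs_te_within_ltr = 0.224

tfbs_ltr_share = nested_share(tfbs_overlapping_te, tfbs_te_within_ltr)
tfbs_ltr_count = total_tfbs * nested_share(
    primate_specific_tfbs, tfbs_overlapping_te, tfbs_te_within_ltr
)

print(f"LTR share of primate-specific cCREs: {ccre_ltr_share:.1%}")
print(f"LTR share of primate-specific TFBSs: {tfbs_ltr_share:.1%}")
print(f"Approximate LTR-derived primate-specific cCREs: {ccre_ltr_count:,.0f}")
print(f"Approximate LTR-derived primate-specific TFBSs: {tfbs_ltr_count:,.0f}")
```

Under this reading, LTRs contribute roughly 31% of primate-specific cCREs and roughly 19% of primate-specific TFBSs; the absolute counts are approximations that inherit the rounding of the quoted totals.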

LTRs Provide a Source of Modularity to Gene Regulation
Given their intrinsic properties, LTRs have indeed been utilized in mammalian evolution for transcriptional promoter and enhancer functions [95]. Additionally, the tendency to recombine neatly to the solo-LTR form introduces essentially finished promoters in modular form to new genomic locales. By comparison, LINE-1 retrotransposition is also driven by RNA Pol II from a 5′ internal core promoter; however, most new LINE-1 insertions are 5′ truncated and therefore incapable of conferring similar cis-regulatory functions [96,97]. Over evolutionary scales, propagation waves of lineage-specific ERVs thus dispensed numerous modules of functional potential that have fueled innovation in the regulation of genes or gene networks. Recent developments in 'omics'-based techniques enable the direct interrogation of genetic and epigenetic alterations throughout a given cell or tissue type of interest. Importantly, these studies continue to reveal a history of virus-host co-evolution that is deeply intertwined and elegantly complex. The mechanisms of ERV-mediated regulation of transcriptional networks in immune defense were exemplified in a 2016 landmark study by Chuong et al. [4]. In that study, the authors showed that the propagation of lineage-specific γ-like ERVs (e.g., ERV1 MER41s) dispensed a reservoir of IFNγ-inducible LTR enhancers of multiple immune-related genes throughout the genome [4]. MER41Bs were discovered to be enriched for STAT1 binding, and one was identified as solely responsible for driving the expression of AIM2, a cytosolic foreign DNA sensor that activates the inflammatory response [4]. In addition to innate immunity, the regulatory exaptation of ERVs has been documented in processes including embryogenesis [98], placentation [99], and the evolution of regulatory differences between species [100,101]. Conversely, the activation of normally repressed ERVs can affect cancer initiation and progression in a unique phenomenon referred to as 'onco-exaptation', for example, by providing promoters of proto-oncogenes or of alternate oncogenic isoforms [6,7,102-106].

ERVs Are Regulated by Epigenetic Control
The necessity of strict ERV regulation to avoid the aberrant activation of local genes and counter the threat of insertional mutagenesis is obvious. As will be discussed later in Section 6, many ERVs are activated in very early cellular development, in which the genome is hypomethylated and accessible; these ERVs are rapidly silenced during differentiation and, in principle, remain tightly regulated in normal somatic tissues [3]. Silencing is enforced via multiple mechanisms, including histone modifications and DNA methylation, leading to a repressive heterochromatic state in what has been referred to as an 'epigenetic corset' [107].
In both mice and humans, targeting the ERV PBS for silencing is a potent strategy that is principally facilitated by KRAB-ZFPs (KZFPs) (Figure 3A). Functionally, KZFPs contain at least one N-terminal Krüppel-associated box (KRAB, a motif related to the ~620-my-old PRDM9/Meisetz, a determinant of recombination hotspots in meiosis [108,109]) and a C-terminal array of Cys2-His2 (C2H2) DNA-binding zinc-finger protein (ZFP, or ZNF) domains [110]. During silencing, ZFP binding to an ERV recruits TRIM28 (or KAP1), the co-repressor and 'master regulator' of canonical silencing, to bind the KRAB domain. This complex serves as a scaffold for heterochromatin-inducing factors such as the H3K9 methyltransferases (e.g., SETDB1 and SUV39h), deacetylase complexes (e.g., NuRD), and HP1 to exert potent repression [106]. This manner of direct PBS-targeted KZFP repression is bypassed for solo-LTRs, which lack the internal PBS, perhaps providing a selective context for solo-LTR formation or exaptation for tissue-specific regulation [29]. Sumoylation of TRIM28 or the actions of other chromatin remodeling factors enhance its localization to ERVs [111]. TRIM28 repression can act as a methylation 'hub' that can promote heterochromatin spreading to the surrounding genome, as facilitated by HP1 recruitment of SETDB1, as well as other H3K9-specific methyltransferases [3,110]. The HUSH complex recruits the chromatin remodeler MORC2 and SETDB1 for H3K9me3 deposition; it represses HIV-1, as well as young ERVs and LINE elements [112]. ERV silencing by KZFPs can also involve H3K9me3-independent marks [113]. The deposition of repressive histone marks targets sites for rapid and stable de novo CpG DNA methylation by DNMT1, DNMT3A, and DNMT3B, generally considered to serve as an epigenetic 'switch' to maintain LTR silencing in differentiated tissues [3]. A general correlation of element age and methylation status indicates that younger (i.e., CpG-rich) ERVs tend to be DNA methylated and, thus, more susceptible to reactivation by DNA methyltransferase inhibitors (DNMTis), a phenotype that is synergistically enhanced by the knockdown of histone methyltransferases (HMTs, e.g., SETDB1, SUV39h, or EZH2), whereas ones of an intermediate age tend to bear repressive histone marks, particularly H3K9me3, and are more sensitive to the knockdown of HMTs [114]. Most of the oldest LTRs (i.e., CpG-poor, e.g., older ERV-L, Gypsy elements) appear susceptible to neither DNMTis nor the knockdown of HMTs, indicating their transcriptional inactivation due to loss-of-function mutations [114]. However, as will be discussed, it is noteworthy that ERV-L-associated transcripts are observed in many human tumors, as well as during embryogenesis, and therefore such loss of function does not appear to apply to the ERV-L group as a whole. The susceptibilities of ERVs to DNMTis or HMT knockdown differ between cell lines, which implies that differential expression resulting from deregulation of these pathways is likely to be reflected in tissues [114].
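The age-dependent tendencies described above can be summarized as a small lookup. The sketch below simply encodes the three qualitative categories from the text; the category labels, field names, and the function `likely_reactivators` are hypothetical, and the mapping describes general trends that, as noted, differ between cell lines and have exceptions (for example, ERV-L transcripts in tumors and embryogenesis).

```python
# Qualitative summary of the trends described in the text [114].
# Keys and field names are illustrative, not an established vocabulary.
AGE_CLASS_PROFILE = {
    "young": {
        "sequence": "CpG-rich",
        "dominant_repression": "DNA methylation",
        "reactivated_by": ["DNMTi", "DNMTi + HMT knockdown (synergistic)"],
    },
    "intermediate": {
        "sequence": "intermediate CpG content",
        "dominant_repression": "repressive histone marks (notably H3K9me3)",
        "reactivated_by": ["HMT knockdown"],
    },
    "old": {
        "sequence": "CpG-poor",
        "dominant_repression": "inactivation by accumulated mutations",
        "reactivated_by": [],
    },
}


def likely_reactivators(age_class: str) -> list:
    """Return the perturbations expected to reactivate ERVs of a given age class."""
    try:
        return list(AGE_CLASS_PROFILE[age_class]["reactivated_by"])
    except KeyError:
        raise ValueError(f"unknown age class: {age_class!r}") from None


for age in ("young", "intermediate", "old"):
    print(age, "->", likely_reactivators(age) or "neither DNMTi nor HMT knockdown")
```

A table of this kind is only a starting hypothesis for a given locus; the measured response in a particular cell type should take precedence.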
The KZFPs are notable as the largest family of ZFP transcriptional regulators in humans and mice and emerged in the Sarcopterygian ancestor of tetrapods, lungfish, and coelacanths ~420 mya [3,110]. Of note, their emergence follows the phylogenetically supported marine origin of the oldest known ERVs, of the class III spuma-like foamy retroviruses, >450 mya around the origin of jawed vertebrates [8]. Later in eutherians, as waves of ERVs propagated in ancestral germlines, KZFPs rapidly expanded and diversified in response, resulting in respective species' copy numbers in the hundreds, with evidence of selection at the C2H2-binding domains [115,116]. Most species analyzed have 200-400 copies; mice have nearly 700 [115]. Humans possess at least 378 KZFPs; over one-third are the products of recent duplications and restricted to primates [109], and over two-thirds have a TE as the primary target [117]. KZFPs also tend to be of evolutionarily similar ages to the ERVs they silence, with the youngest possessing the highest affinities for TRIM28 [116]. On the other hand, nearly all ancient KZFPs are inefficient recruiters of TRIM28 but appear to be selectively constrained, suggesting alternate functions [117]. Considering the genome-wide TFBS presence in humans, motifs corresponding to KZFP-binding sites have the highest enrichments in ERVs (as well as other TEs) [93]. Among the outliers with the most TFBSs overlapping ERVs [93] are KZFPs implicated in H3K9me3-mediated silencing (ZNF586 and ZNF680), as well as H3K9me3-independent LTR silencing (ZNF329 and ZNF331) during early development [113]. ZNF350 (or ZBRK1), ZNF418, and ZNF134 are also identified [93].
KZFP expansion has been suggested as a host mechanism to prevent ERV spread as part of an evolutionary 'arms race', in which the genetic escape of an ERV from repressive KZFP binding selects for emergent altered KZFPs, and the cycle repeats [115]. However, particularly in the case of ERVs, the vast majority (and perhaps all) of KZFP targets are elements technically no longer capable of infection, but that nonetheless retain the ability to be used transcriptionally if regulated. ERV/KZFP interactions are widely implicated in establishing species-specific networks in early development, and many KZFP sites are bound by tissue-specific TFs and display characteristics of enhancers at later stages and in adult tissues [115]. For example, the primate-specific KZFPs ZNF417 and ZNF587 repress HERV-K members in embryonic stem cells and later maintain control of the ERVs in the developing and adult human brain [118]. Alterations of distinct KZFP/TE profiles are observed during brain development, in which the TEs serve as alternate promoters of neurogenesis-specific genes [119]. Thus, an arms race alone is insufficient to explain the selection and maintenance of KZFPs [115]. Alternatively, the regulatory use of ERVs by KZFPs is proposed to promote their domestication and drive key aspects of species evolution and transcriptional networks [115,116,120].

ERV Silencing Mechanisms Are Reversible
The loss of tight epigenetic control likewise disrupts the regulation of ERVs/LTRs that are normally silenced to promote genomic stability, and this disruption is associated with several pathologies [7,46,121,122]. Extensive chromatin remodeling occurs during malignant transformation, resulting in the redistribution of DNA methylation across the genome and the accompanying accessibility of ERVs and other retroelements [123,124]. Hypomethylation is a hallmark characteristic of tumors and is recapitulated in cell models of cancer [125,126]. For example, constitutive signaling by Ras oncogenic overexpression leads to hypomethylation in a variety of cellular models of transformation, and while minimally expressed in hTERT immortalized cells, ERVs are highly transcribed in Ras-transformed cells [94,127-129]. Loss of repressive histone marks is accompanied by the aberrant expression of ERVs [123]. As discussed in Section 5, the alteration of both epigenetic properties contributing to expressed ERVs (and the consequences of their expression) has been of increasing interest to the field regarding tumor immunogenicity and immunotherapy [130-133].
Importantly, beyond a loss of repressive silencing, relevant LTR-specific changes alter TFBSs and therefore the potential for silencing, as well as transcriptional use of those LTRs [94]. The properties contributing to ERV expression thus converge on themes regarding direct LTR regulation (i.e., TSS in the LTR) that are dependent on (i) the differential access of LTRs as promoters given a particular cell state, (ii) the differential presence of TFs specific to accessible LTRs, and (iii) underlying genetic variations that are intrinsic to the LTRs themselves. The silencing of most ERVs implies their expression is intrinsically tied to their accessibility within chromatin, as well as the ability to be recognized. Given observations of differential ERV activation upon treatments with DNMTis (resulting in the tendency of 'younger' age ERVs to be expressed) or HMT inhibitors (expression of 'intermediate' age ERVs) [114], the prediction can be made that the internal inclusion of ERVs within transcripts may tend to originate from passive transcriptional effects, particularly regarding older integrants.

ERV Expression Is Associated with Human Disease
The discovery of 'RNA tumor virus'-like sequences in human DNA sparked decades of research seeking connections to cancer [134,135]. The sequencing of the human genome, and, later, whole genomes of individuals, expedited the identification and characterization of a multitude of ERVs [3]. ERV expression in the form of elevated mRNAs and ERV-encoded proteins is now known to occur in tumors and cell lines that model tumors and other environments. For example, transcripts of HERV-H, HERV-K, HERV-F, HERV-R, and HERV-S have all been observed in various cancer cell lines [136]. HERV-K HML-2 expression is correlated with cancers, including breast cancer, ovarian cancer, germ cell tumors, prostate cancer, melanoma, lung cancer, lymphoma, and others [2,7,137,138]. HML-2 LTR activation can aberrantly regulate nearby genes associated with breast cancer [139]. HERV-W expression is correlated with multiple sclerosis (MS), bipolar disorder, and schizophrenia [140-142]. HERV-H transcripts are significantly elevated in head and neck cancers, and HERV-E and HERV-K HML-6 are significantly downregulated in the same samples [143]. HERV-H drives many lncRNAs associated with various cancers, such as teratocarcinoma, bladder carcinoma, testicular tumors, and others [7]. ERV products display oncogenic properties, for example, the HERV-K proteins Rec and Np9 (respectively, from spliced mRNAs from type II and type I HML-2 proviruses) [144,145]. The Env proteins of HERV-K, HERV-H, and others possess immunosuppressive properties, suggesting an ability to modulate the immune response [146,147], and have been proposed as potential vaccine targets [148]. HERV-K Env can induce TFs in pathways associated with oncogenic transformation [149], as well as elicit cytokine release [150]. HERV-W Env has been identified in neural plaques of MS patients and contributes to the cellular damage of axons in MS [151,152] as well as cell-cell fusion in some cancers [153,154]. This Env has also been shown to induce IFN-β innate immune signaling, leading to neuronal apoptosis in early-onset schizophrenia [155]. Collectively, these and other similar observations continue to motivate research seeking to determine the scope of ERV involvement in disease, with obvious interest in establishing meaningful links to phenotypes. It is important to keep in mind that the deregulation of other retroelement types (e.g., LINE and SINE; Figure 1B) can drive aberrant phenotypes, including oncogenic mutagenesis [156]. Also of importance, ERVs are expressed in healthy tissues in humans and animal models [19,157-160].

ERVs Are Broadly Expressed in Various Cell Types
Within the past decade, the sequencing of whole transcriptomes facilitated the discovery that ERVs are expressed in every examined tissue and cell line [160]. These findings raise the questions of which ERVs are expressed and in which cell types. Though earlier studies mostly focused on members of particular ERV groups (e.g., HERV-K and HERV-W) or were limited to reported expressed ERVs according to broad classifications (e.g., 'ERV1' and 'ERV-L'), it is now understood that there is a high degree of heterogeneity of expressed ERVs that differ vastly in representation by cell type [130,137,158]. In fact, thousands of transcribed ERVs are observed. Analysis of GTEx RNAseq data across normal tissues suggests some 13,889 ERVs are expressed, contributing to 0.19-1.9% of polyA RNAs across 42 tissue types [158]. Such targeted approaches to identify individually expressed ERVs also pinpoint exact expressed loci in cancers. For example, an analysis of prostate, breast, and colon cancer TCGA RNAseq identifies numerous differentially expressed ERV loci, and the top up- and downregulated loci differ strikingly in all three cancer types (two exceptions are the upregulated HERVs at 19q13.12a in breast and prostate tumors and HERV-L at 8q24.3d in breast and colon tumors) [161]. Though the significance is not clear, the two top upregulated prostate cancer ERVs are situated in a chr22 region that has been linked to chromosomal rearrangements: HERV-K11 LTR5Hs 22q11.21 and HERV-K HML-2 LTR5B 22q11.23 [161]. This latter provirus is notable for control by a ~550 bp upstream solo-LTR5H, which has been characterized to drive the spliced lncRNA of LTR5Hs-B22q11.23, PCAT14, a prostate cancer biomarker of unknown function [162,163]. A recent study revealed the solo-LTR possesses nearly 50 TFBSs (nearly half of which correspond to ZNF-binding motifs) that are absent from related LTR5H members [157]. The unique TFBSs include a PRDM9 motif [157]; normally restricted to germ cells, PRDM9 is aberrantly expressed in some cancers, including prostate, and structural variant breakpoints frequently neighbor the TFBS motif [164]. Though speculative, the LTR has been implicated in an oncogenic translocation in the form of an overexpressed LTR_Hs-B-ETV1 fusion transcript in a prostate tumor of an ETV1-truncated variant [165]. Recent studies have taken further advantage of RNAseq to infer ERV-sourced chimeric transcripts (i.e., possessing the ERV-derived sequence, as well as exonic sequence, of a conventional gene) (Figure 3B) as an indication of cis-regulatory transcriptional activities associated with ERV expression [127,161,166].
The findings revealed that expressed ERVs in HRAS-transformed cells contribute to transcripts associated with standalone LTRs (i.e., ERV-only sequence with apparent TSSs in the LTR), as well as ones predicted to be LTR-initiated chimeras of genes or lncRNAs [127]. About 40 ERV-associated locus-specific transcripts from HRAS-transformed cells were also identified within TCGA RNAseq from breast, colon, or prostate tumors (e.g., including members of HERV-L, HERV-FRD-like PABL_A, and HERV-H) [127]. These findings suggest the presence of locus-specific changes controlling ERV expression that may be recapitulated in certain cell types. Such changes may correlate with LTRs expressed upon activation of common signaling pathways, but ERV expression is not precisely coordinated within perturbed cellular states.

The Cancer ERV Transcriptome Is Limited but Complex
An understanding of the larger scope of the potential impact of ERV expression is aided by the deeper annotation and quantitation of expressed loci within additional tumor types or cellular models. One such approach recapitulates ERV transcripts by genome-guided de novo assembly of an 'LTR transcriptome' [166]. The analysis of the TCGA LTR transcriptome of 31 cancer types reveals the inclusion of just 17.3% of genomic ERV loci (of 630,356 in GRCh38), of which 3.2% are present in tumor-specific transcripts [166]. ERVs that populate recurrent cancer-specific transcripts (CSTs) represent broad ERV group members but account for less than 1% of annotated loci, implying that the involvement of most ERVs is limited by the cellular environment controlling their expression [166]. For example, the HERV-K 22q11.23 lncRNA PCAT14 is highly expressed in prostate tumors but also in tumors of the testes and lungs, suggesting accessibility of the locus over multiple tumor types [166]. Many transcripts are associated with ERVL-MaLRs (e.g., older MLT1s, primate-specific MSTs, and simian primate-specific THE1s [58]); young LTR7b and LTR7y HERV-H members, as well as human-specific and unfixed HERV-K HML-2, are also present. Importantly, these findings hint at the limitation in such studies that unannotated insertionally polymorphic LTR5H members may contribute to the data but not be mapped in genome-guided analyses [64]. The variable presence of insertions within relatively new genomic contexts could have profoundly disruptive consequences. Although not a direct comparison, it should be noted that HML-2 proviral expression is biased to older members in normal tissues of GTEx RNAseq; among the ones expressed is the LTR5Hs-driven 22q11.23 PCAT14 [157]. Thus, highly expressed cancer-specific ERVs represent a relatively small proportion of LTRs, indicating common shifts in the cellular environment between some involved loci.
The landscape of LTR-associated transcripts in cancers is highly complex but is beginning to be disentangled. Mapping of the TCGA cancer-specific transcripts reveals that standalone ERVs account for 17% of the transcripts and LTR-initiated chimeras with gene or lncRNA sequence for 9% [166]. In particular, the LTRs of these latter chimeras provide prime candidates for novel 'onco-exaptation' events, in which the reactivation of an LTR drives the overexpression of a proto-oncogene or oncogenic isoform [6,102]. A growing number of examples of LTRs involved in onco-exaptation have been reported [6,29,32,102-105] and recently reviewed in [106]. For example, an LTR7y/HERV-H cryptic promoter-driven SLCO1B3 oncogene transcript previously identified in colon, lung, and pancreatic cancers is highly abundant in TCGA of the stomach and esophagus [106,166]. A recent study confirmed that KLF5-mediated activation of an LTR7y/HERV-H drives a CALB1 isoform in lung squamous cell carcinoma [103]. Interestingly, distinct LTRs may also influence the activation of the same gene, possibly due to different cellular contexts. For example, recent studies independently found a MER21B-E2F3 chimeric transcript among oncogenic transcripts in bladder cancer cell lines [105], whereas a HERV9 LTR12C-E2F3 transcript is among the top oncogenic transcripts in ovary, prostate, and urothelial cancers [102]. In this latter study, the authors identified 129 TE onco-exaptation events involving 106 genes across 3864 tumors, with at least one event in around 50% of the tumors; onco-exaptation of ERVs was estimated to be one- to two-fold higher than other TE classes [102]. Additional ERV-oncogene transcripts include a MaLR MLT1J-SALL4 predominantly in breast carcinomas and MaLR THE1A-HMGA2 nearly exclusive to skin cutaneous melanomas [102]. Numerous non-LTR retroelement onco-exaptation events have been reported [102,105].

ERVs Induce a State of 'Viral Mimicry'
The induction of IFN-stimulated genes (ISGs) is observed in many tumors and cell models and is due to the phenomenon of 'viral mimicry' [131]. In viral mimicry, dsRNAs sourced from retroelements (e.g., from bidirectional transcription of a single element, hybridization of transcripts of high sequence similarity, or hairpin structures of inverted repeats) are sensed by the cell, interpreted as a viral infection, and trigger antiviral IFN signaling, setting into action the innate immune response [167,168]. dsRNAs formed via the transcription of inverted repeat SINE/Alu elements appear to be the major driver of viral mimicry activation [169], though LINE, as well as ERV dsRNA species, also trigger an antiviral state [131]. Because the outcomes of this response can include PKR-mediated cell death and increased processing and presentation of TE-derived peptides as tumor-associated antigens, therapeutic agents that expose such immune vulnerabilities of tumors are of high interest, and recent studies have improved our understanding of ERV involvement [131]. For example, induced hypomethylation by the DNMTi decitabine in clear cell renal carcinoma cell lines induces broadly activated ERV groups and antiviral signaling; RNAs of the highest expressed ERVs (e.g., ERV-Fc2-related) are sensor-bound, and the signaling is attenuated by the knockout of MDA5, RIGI, or downstream MAVS [170]. In another study, treatment of pancreatic ductal adenocarcinoma cells with the MEK inhibitor trametinib induced ERV1 (e.g., MERs), ERV-K (including HML-2), and ERV-L (e.g., MLT1s), resulting in a robust MAVS-dependent IFN response [171]. Remarkably, a subset of IFNγ-inducible LTRs (e.g., mostly ERVL-MaLR MLT1, MST members) situated antisense in the 3′ UTRs of several STAT1-inducible genes (e.g., TNFRSF9, TRIM22, and TRIM38) has even evolved to be uniquely primed for bidirectional transcription; they are normally silenced by EZH2, and its knockdown drives feedforward IFNγ signaling strongly associated with MHC-1 presentation [172]. Candidate ERV loci for contributing to dsRNAs via bidirectional transcription are identified in the TCGA LTR transcriptome; around 30% of highly expressed tumor-specific transcripts possess a terminal LTR, as well as conventional gene TSS [166]. Chromatin regulators have been characterized in the context of ERV-associated viral mimicry [131,173,174]. A regulator of SETDB1 maintenance, PHF8, has been identified as a mediator of tumor immune escape; its ablation stimulates antiviral mimicry in colorectal cancer cells, resulting in the inhibition of tumor growth and immune susceptibility [175]. Consistent with these findings, overexpression of chromatin regulators such as SETDB1 and members of the HUSH and TRIM28 complexes is implicated in tumor immune inhibition [131,176]. Depletion of the KZFPs ZNF417 and ZNF587 (primate-specific repressive TFs of evolutionarily young HERV-K [177]) in cells derived from diffuse B-cell lymphoma results in heterochromatin remodeling and IFN signaling, thus enhancing immune susceptibilities [178].
It is important to recognize that cancer cells can likewise adapt to retroelement-driven viral mimicry to circumvent activation of the antiviral state. For example, ADAR1-mediated A-to-I editing of SINE/Alu-derived dsRNAs renders them unrecognizable to the dsRNA sensor MDA5; recent work has demonstrated that ADAR1-dependent cancer cells evade viral mimicry activation, and its depletion reduces tumor growth in patient-derived cancer cells [169]. Systematically screening for viral mimicry adaptations has identified additional proteins involved in cancer dependencies [179]. For example, the RNA decay protein XRN1, which degrades uncapped RNAs (e.g., those sourced from transcription of SINE/Alu), confers a dependency in a subset of cancer cell lines; its knockout is associated with reduced cell viability consistent with the induction of viral mimicry [179]. Other cellular proteins in pathways involving RNA modification and nucleic acid metabolism were implicated in the same study [179]. Thus, targeted therapies capable of disrupting such cancer dependencies offer the potential to overcome viral mimicry adaptation, warranting further investigation. Augmenting the antiviral response via ERV activation may represent a novel avenue of cancer therapeutics. In this regard, two of the 13 top genes reported alongside XRN1 as regulating viral mimicry adaptation are also present in the TCGA highly expressed tumor-specific LTR transcriptome predicted protein-coding transcripts: CFLAR (LTR5Hs-associated in testis) and ILK (MaLR MLT1M-associated in several tumors) [166,179].
Because IFNs stimulate ISG immune responses involving the antigen presentation machinery, ERV sequences spliced or embedded within transcripts have the potential to produce completely novel antigenic peptides [121,133]. A significant revelation has been that ERVs associated with transcripts in somatic tissues, including tumors, frequently originate from alternate promoters rather than the LTRs themselves [166]. The contextual placement of the ERVs thus needs to be fully considered in RNAseq callsets, as their presence does not necessitate direct use as a promoter or enhancer. For example, chimeras with gene or lncRNA or transcripts with spliced or embedded ERV sequence account for roughly 40% of TCGA cancer-specific highly expressed transcripts [166]. Similar observations have been made in the examination of healthy tissues of GTEx RNAseq for HERV-K HML-2, in which just nine of 37 expressed proviruses had clear 5′ LTR TSSs [157]. In that study, ERV expression by the mechanism of readthrough was epitomized by transcription through a largely truncated LTR5B at 6p25.1 that lacked a 5′ end [157]. The production of immunogenic ERV-derived peptides in an antitumor adaptive response implies the potential for antitumor therapeutic relevance [121,133]. Highly predictable ERV-overlapping transcripts may thus aid in prognosis and in understanding cancer-specific antigenicity [166].

ERVs Expressed in Cancers Include Ones Exapted in Development
Several placental genes have been previously identified to possess exapted LTR promoters [95], and a recent work has characterized genes with exapted LTRs that bear enhancer activities in tissues of the placenta [180]. Interestingly, TCGA ERV-associated cancer-specific transcripts overlap genes with exapted LTRs that bear promoter activities in the trophoblast, including NOS3 (exapted LTR10A promoter), PTN (LTR2B/HERV-E), and HSD17B1 (MER21A) [95,166,181]. These transcripts are present in multiple tumor types and include sequences of the gene and its corresponding LTR. Other genes with reported trophoblast LTR exaptation, for example, the X-linked MID1 (exapted HERV-E promoter), ENTPD1 (MER39B), and ACKR2 (MER39) [181,182], are present in TCGA but associated with alternate LTRs [166]. The exapted ERVWE1 env, syncytin-1, is also highly expressed in some TCGA tumors [166]. Many recent studies have implicated the relevance of lncRNAs in various cellular processes [81-85], including in tissues of the trophoblast and placenta; the biological activities of these lncRNAs were recently reviewed in [183]. Notably, TCGA highly expressed tumor-specific transcripts also include ones that overlap with the reported lncRNAs of the trophoblast [166,183]. These include the previously characterized primate-specific LTR7/HERV-H lncRNA UCA1 that has been recently implicated in the proliferation of human trophoblast stem cells [184], as well as the lncRNAs SH3PXD2A-AS1, RPAIN, PROX1, MEG3, and PVT1 [183]. Deregulation of these lncRNAs is significantly associated with progression in a variety of cancers, as well as early-onset preeclampsia [183]. Similarities have been drawn between developmental tissues such as embryo and trophoblast with cancer cells [185,186]. Possibly, the combination of activated common signaling pathways, as well as a permissive chromatin state, is reflective of the exaptation of ERVs in early development that are susceptible to later reemergence in the cancer landscape [181]. An alternative proposal is that the activation of early developmental LTRs may promote dedifferentiation through the onco-exaptation of genes that influence chromatin states reminiscent of early development, though causative links between the two are not yet clear [187].

ERV Expression in Embryogenesis Is Precisely Regulated
Recent studies have highlighted the regulation and roles of ERV activation in early cellular development. After fertilization, the genome is in a globally demethylated state [188], and chromatin remodeling is established gradually [189]. The onset of transcription, i.e., zygotic or embryo genome activation (here, EGA), can be characterized by the cell number of the embryo (e.g., two-cell is '2C'). EGA varies between mice and humans, widely reported at 2C and by 8C stages, respectively, and ERVs are expressed at each stage [190,191], though recent investigations have revealed earlier low-level transcription in both species, including ERVs [192,193]. Regardless, it is clear that precisely regulated lineage-specific ERV expression and subsequent silencing coincide strongly and specifically in a stage-dependent manner in mice and humans, suggesting key roles in species-specific developmental programs [190,194,195]. For example, in mice, MERV-L and ERVL-MaLR members are activated in 2C and 4C embryos, whereas ERV-K members are later expressed in the 8C and morula [196]. In humans, studies have shown that HERV-K14 and HERV9 transcripts are present in the oocyte and dramatically increased in the 2C and 8C stages, respectively; HERV-L, ERVL-MaLR, and HERV-H (LTR7b) are expressed in the 8C; HERV-K (LTR5Hs) in the morula; and HERV-H (LTR7y) in the blastocyst [194]. Recent studies have additionally hinted at the activation of similar retroelement expression in the embryos of other placental mammals, such as cow, pig, and dog [197-199]. Understood according to broad classification (e.g., ERV1, ERVL, and ERVL-MaLR), these findings underpin paths of comparative research in these models. Collectively, these observations have led to the intriguing proposal that species-specific ERV activation may provide a 'molecular rheostat' for the regulation of pluripotency [200]. Specific discussion of ongoing and recent findings for those belonging to the mouse ERV-L and human ERV-L, ERV-H, and ERV-K groups follows.

Mouse ERV-L
Members of the DUX (double homeobox; mouse Dux and human DUX4) TF gene family are among the facilitators of EGA [201-203]. Promoters of expressed transcripts in mouse embryos are enriched for the Dux TFBS and include 2C gene promoters, as well as LTRs of MERV-L-related lineages (e.g., MuERV-L and ERVL-MaLR) [57,190,201,202,204]. Recent works have highlighted the complexities of Dux/MERV-L regulatory dynamics. Dux activates MERV-L members at the 2C stage, concomitant with EGA [190,201,204]. MERV-L are silenced upon exiting the 2C stage by H3K9 methyltransferases G9a and GLP [3,205]. Upon activation, MERV-L transcripts contribute to ~3% of polyA RNAs in totipotency and serve as a general marker of the 2C stage and a transient 2C-like state [190,195]. The broad depletion of full-length MERV-L transcripts has been shown to cause lethality, with loss of lineage specification and genomic instability, and MERV-L-depleted embryos retain an accessible chromatin structure and aberrant expression of a subset of 2C genes [206]. A recent study indicated that the rapid silencing of Dux by the exit of the 2C stage is mediated by LINE-1 RNAs in a complex with nucleolin-1 and TRIM28/Kap1 and is linked with rRNA synthesis [207], as well as a Dux-induced feedback loop of TRIM24- and TRIM33-mediated silencing via the Muridae-specific Duxbl [208]. The silencing of Dux (and in turn, MERV-L) is also linked with a late-2C surge in cytoplasmic viscosity accompanied by nuclear remodeling and nucleoli maturation [209]. Preventing this state leads to incomplete silencing of Dux/MERV-L and cleavage stage arrest [209]. These findings suggest a requirement of the MERV-L presence and strict regulation in 2C embryos, with a putative role in regulating the switch from totipotency to pluripotency [206].
MERV-L transcripts include spliced 5′ LTR-first exon fusions with coding sequences of nonretroviral origin, indicating the exaptation of LTR promoter functions as a resource for the coordinated expression of genes [57,190]. Interestingly, the LTRs linked to these transcripts appear biased by age, with young ERV groups (e.g., Mus-specific; MT2s) predominantly represented [57]. Among the expressed MERV-L sequences are a proportion of MERV-L MT2 that encode gag ORFs, including ones sourced from Mus-specific insertions amplified within the last ~10 my [56,210]. A subset retains gag ORFs, which have been previously shown to contribute to epsilon virus-like particles of an unusual morphology [210]. In this regard, MERV-L-Gag proteins are also present in early embryos at the mid-2C to 4C stages [206], and virus-like particles are observed in the early embryo in the endoplasmic reticulum [190]. The levels of Gag and OCT4 have been shown to be inversely correlated; in totipotent cells where Gag is high, OCT4 is low, and the opposite is observed in pluripotent cells, despite no changes in mRNA levels of the TF [190]. Linking these observations, a recent study has implicated MERV-L-Gag as a modulator of the TFs OCT4 and Sox2 in early-stage (2C) embryos [211]. The study identified a MERV-L Gag binding partner, the prefoldin complex protein URI, which otherwise binds and protects OCT4 and Sox2 from degradation [211]. In this model, the increase in MERV-L Gag displaces URI from either of the two in the 2C stage, leading to OCT4 and Sox2 degradation [211]. The subsequent decrease in Gag levels confers OCT4 and Sox2 actions and the shift to pluripotency [211]. The findings implicate a potential exapted role for MERV-L Gag as a modulator of cell lineage specification in mice in the transition from totipotency to pluripotency. Importantly, this represents the first reported functional interaction of an ERV protein in mouse embryonic development. In this regard, the Gag of a ~10 my old distantly related MERV-L is well characterized for its exapted use as the restriction factor Fv1 [212].

Human ERV-L
Recent studies have advanced our understanding of HERV-L in early development and drawn parallels and distinctions with MERV-L. During the transition to the 2C stage, HERV-L-related LTRs are broadly derepressed with accessible but inactive promoters [213]. HERV-L and ERVL-MaLR members display a marked induction associated with accessible promoter and enhancer-like regions beginning in the 4C stage that is followed by rapid silencing [194,214]. In contrast to activated MERV-L in mouse embryos, recent works have indicated that activated HERV-L includes relatively older ERV-Ls (e.g., MLT2A1 and MLT2A2) [191,214,215]. Although there are MLT2 groups that predate the human-mouse split, these two HERV-L groups entered the germline of simian primate ancestors ~65-45 mya [215]. Their activation in embryogenesis appears to be conserved among the examined extant species (e.g., human, macaque, and marmoset) [215].
Thousands of MLT2As become accessible during the transition from zygote to the 2C stage; their induction coincides with DUX4 gene activation in the 4C and 8C stages, and activated LTRs are indeed shown to be DUX4-bound [201,215,216]. Mapping of the transcripts reveals TSSs are in the LTRs, further indicating precise regulation [215]. Transcribed LTRs tend to be represented by 'long' MLT2A members >200 bp, with splice sites mostly to a sequence that is unannotated or within non-coding exons. Spliced transcripts from humans include ones with sequences from at least 21 protein-coding genes; a single spliced protein-coding transcript (i.e., SH3BGRL) is present in humans, macaques, and marmosets [215]. In considering mouse Dux activation of 2C genes as well as MERV-L as discussed above, these findings support distinct evolutionary patterns within DUX, which, despite their divergence, have maintained EGA-associated gene promoter interactions, as well as ERV activation by species (e.g., subfamily specificity of HERV-L and MERV-L LTRs of humans or mice), and experienced shifts in the properties of activated ERVs (e.g., the tendency of older vs. younger, respectively). There are several additional HERV-L MLT2-related groups in humans, but none are activated in embryos in the manner of MLT2A1 and MLT2A2, suggested to be due to the lack of DUX4-binding motifs [215].
As discussed in Section 4, among all TFBSs, those for KZFPs are outliers among the most enriched intersecting ERVs; also identified within the top outliers for TFBS enrichment is DUX4 [93]. Aside from ERV-L MLT2As, DUX4-binding motifs are also pervasive within ERVL-MaLRs (e.g., eutherian MLT1s and primate-specific THE1 and MST groups) and are present in relatively minor subsets of other LTR and TE types [60]. For example, of 63,795 DUX4 motifs predicted in the human genome, nearly two-thirds overlap LTRs, and over one-third overlap ERVL-MaLRs [60]. DUX4 activation in 4C embryos of ERVL-MaLR bidirectional enhancer-like regions significantly alters the chromatin accessibility and appears to contribute to regulatory accessible regions and transcripts of EGA genes [214]. Though the repression of these ERV groups is not fully clear, ZNF-mediated H3K9me3 deposition appears to be stage-specific and to act on different ERV groups; for example, ZNF766 and ZNF486 bind ERVL-MaLR THE1 and MST members in the 8C stage, whereas the ERV-L examined in the study are H3K9me3-unmarked and likely silenced by other mechanisms [113]. As discussed above, a majority of 'older' ERV-L members are reported as unresponsive to DNMTis, as well as H3K9me3 inhibitors [114]. Thus, the mechanisms involving ERV-L regulation remain to be clarified and should benefit from further locus-specific characterization of this group. Given the common presence and DUX-mediated activation of ERV-L- and ERVL-MaLR-related members in humans and mice, similar functions between the two imply their independent exaptation in both species [5].
Embryonic DUX4-driven HERV-L transcripts consist of a large proportion of MLT2A LTRs with splice donor sites fused with gene sequences [194,215]. DUX4 is strictly silenced in differentiated tissues; its re-expression activates TSSs as alternative drivers of genes and lncRNAs in ERVL and MaLR gene chimeras [217] and is implicated in facioscapulohumeral muscular dystrophy [204]. In cancers, DUX4 re-expression is reported to block IFNγ induction of class I MHC antigen presentation, implicating a property of immune evasion [218], and promotes a metastable early embryonic cell program [219]. A recent examination of some somatic tissues implies that reactivated HERV-L may later serve as functional alternative promoters [215]. For example, MLT2A1 members appear to be capable of initiating DUX4-independent synthesis and providing the first exons of bona fide protein-coding transcripts (e.g., ABCE1, GALNT13, and COL5A1) when later reactivated in some examined somatic cell types of humans (but, importantly, not macaque), such as the pineal gland [215]. Further, the canonical start codons of ABCE1 and GALNT13 are in exon 2 and thus not interrupted in these transcripts. Based on the TFBS profiles of brain tissues, the authors suggested the TF OTX2 as a candidate activator of the associated MLT2As [215]. On this note, ERV-associated tumor-specific transcripts involving all MLT2 groups are accounted for within the tumor-specific TCGA LTR transcriptome [51,166]. Highly expressed tumor-specific TCGA transcripts include alternate LTR chimeras with GALNT13 (associated LTR12D) in tumors from the brain and adrenal gland, as well as COL5A1 (MaLR THE1B) in lymph nodes [166].

Human ERV-H
The activation of HERV-H is implicated in early embryo programming and serves as a marker thereof [220][221][222]. HERV-H transcripts contribute to roughly 2% of polyA mRNAs in human embryonic stem cells (hESCs), and their activation promotes the maintenance of pluripotency [223]. They comprise LTR-initiated chimeric transcripts, including ones with alternative exons, as well as lncRNAs of biological relevance to pluripotency. For example, the lncRNA linc-ROR is proposed as a sponge of regulatory miRNAs for OCT4, SOX2, and NANOG to prevent their degradation [221,223,224]. Highly expressed HERV-H demarcate CTCF cell-specific chromatin shaping by establishing topologically associating domain (TAD) boundaries via DNA loop formation and pluripotent chromatin structure [223,225]. Interestingly, CTCF TAD boundaries are lost upon HERV-H depletion, and the random introduction of HERV-H sequences on chromosomes recapitulates TAD boundary formation independent of CTCF [223,225]. Though broad depletion of HERV-H results in the loss of pluripotency in hESCs, there have been mixed results [221,222,224], as has been noted [72], possibly due to sequence differences in constructs used between studies [72,223]. A recent work correlated the silencing of HERV-H lncRNAs with a candidate modulator preventing dedifferentiation, ZBTB12, a conserved BTB-containing ZFP [200]. ZBTB12 binding and association with SIN3A/HDAC is observed locally for ~70 HERV-H loci and correlates strongly with the silencing of HERV-H lncRNAs (e.g., linc-ROR and ESRG). The ectopic expression of mouse ZBTB12 recapitulates HERV-H silencing in hESCs, whereas its knockout in mouse epiblast stem cells does not impact ERV expression [200]. The authors suggested a scenario of an acquired silencing function during primate evolution, in which HERV-H members inserted near pre-existing ZBTB12 binding sites were positively selected for control of the exit from pluripotency [200].
The recent sequence-based refinement of HERV-H LTRs permits the curation of subgroup properties of preimplantation embryos [72]. Transcripts originating from HERV-H subgroups are differentially enriched across the embryo stages and predominantly sourced from younger LTR7b, LTR7y, and recently defined LTR7up loci [72,194]. Strong LTR7b activation peaks at the 8C stage during EGA and morula [72,194] and thus overlaps in stage presence with HERV-L MLT2A members [194]; the strong induction of LTR7y overlaps this pattern, LTR7y transcripts are later significantly elevated in the blastocyst [72,194], and LTR7up1/2 are dramatically induced in the blastocyst [72]. Other LTR7s are differentially expressed in stage-specific patterns to a lesser extent [72,194]. Sequence-based analyses of all 5′ and solo-LTR copies reveal a dynamic history resulting in the gain, loss, and exchange of cis-regulatory elements among the subgroups [30,72]. The youngest (e.g., LTR7y and LTR7up) appear to have experienced relatively rapid diversification and are among those most highly expressed in early developmental stages [30,72,194], implying the recent evolutionary innovation of precisely regulated sequences. For example, an LTR7up-specific modification is the acquisition of a predicted SOX2/3 TFBS shown in vitro to be necessary for transcription [72]. Many LTR7up loci distinctly overlap with actively bound TFBSs, including ones in the early embryo stages, such as KLF4, NANOG, SOX2, OCT4, and others, in which their sequences are differentially enriched compared to non-transcribed copies and ones of related subgroups [72,221,222], consistent with the TFBS presence from ENCODE and Roadmap Epigenomics data discussed above in Section 3 [86]. However, TF occupancy alone does not fully explain the patterns of transcribed vs. non-transcribed loci [72]. Thus, the observed patterns of HERV-H activation are due, at least in part, to intrinsic LTR properties. Further disentanglement of the properties of activated loci should benefit from the refined characterization of this ERV group and permit targeted analyses by subgroup-specific features.
As mentioned, HERV-H is notable for its pronounced shift in abundance of proviral to solo-LTR copies relative to other HERV groups [28,32,72,226]. This state could reflect HERV-H as a relatively benign component of the genome (e.g., loss of env reminiscent of Mus-specific MERV-L [210]) but is also suggestive of selection on sequences beyond the LTR [31,32]. In this regard, most HERV-H 5′ internal sequences are retained (including three partial gag ORFs [39]), and a subset of these proviral loci are positively correlated with transcription in preimplantation embryos, suggestive of selection [32]. Though the mechanisms driving HERV-H preservation are not fully clear, these observations seem to suggest that selection in favor of the proviral sequence for some copies may result from their activities in embryogenesis [31]. The ability to tightly control HERV-H repression while selecting for the internal sequence could be a factor. In this regard, KZFPs (e.g., ZNF534 and ZNF90), as well as KAP1 and H3K9me3 loading, are captured at HERV-H LTR7up1/2 in ChIPseq of hESCs but are neither clearly enriched nor depleted compared to other HERV-H LTRs, indicating that the repressive actions of these KZFPs do not fully correlate with their regulation in ESCs and thus implicating the involvement of other factors [72].

Human ERV-K
Expressed HERV-K have been reported over the early embryo stages [194,213]. HERV-K HML-1 was active in the ancestors of OWMs ~40-30 mya. Transcripts from HML-1 members (e.g., LTR14B) are present in minor but detectable levels in the oocyte and peak in the 2C stage, returning to minor levels in the blastocyst [194,213]. Another activated HERV-K group is from human-specific HML-2 members (e.g., LTR5Hs), also with a minor presence over multiple stages that peaks in the morula and is considered to be a marker of pluripotency [194,195]. For example, beyond the 8C into the morula, LTR5Hs are decorated with H3K27ac enhancer marks and strongly driven by pluripotency TFs before being rapidly silenced by KZFPs [116]. LTR5Hs activation promotes open chromatin enhancer states, and experimentally forced repression alters the regulation of genes within <100 kb [116]. Over the past several years, there have been advances in the knowledge of this group.
Among the transcripts from both HERV-K groups are ones attributed to transcription into a flanking sequence with little evidence of splicing [194]. Possibly, some of these contribute to the reported HERV-K Gag-associated particles of the blastocyst [195] (around 17 gag ORFs are accounted for over these proviruses [54]), but these observations have yet to be substantiated. Transcripts corresponding to rec, an alternatively spliced product of HERV-K, have also been reported in the blastocyst stage [195]. Interestingly, the Rec protein appears to associate with and facilitate the transport of nonretroviral mRNAs to the cytoplasm in those cells [195]. Moreover, overexpression of Rec enhances the IFITM1 mRNA levels, a phenotype that may reflect immunoprotection by an early antiviral response of the embryo [195]. An OCT4-binding motif is present among LTR5Hs (but not the older LTR5A or LTR5B); LTRs of expressed LTR5Hs are indeed bound and transactivated by OCT4, and its knockdown depletes LTR5Hs transcripts in early-stage embryos [195]. Thus, HERV-K subgroups appear to harbor sequence-specific functional differences in a regulatory capacity. Consistent with this notion, an analysis of publicly available ChIPseq data of naïve and primed human ESCs indicated OCT4 and H3K27ac enrichments at LTR5Hs in the former but not the latter, suggesting their activity is also specific to the cell type [227]. Analysis of RNAseq data from the same respective samples revealed the expression of genes up to 120 kb from LTR5Hs loci (but not LTR5A or LTR5B), suggestive of long-range enhancer effects. The expression of members of this ERV group should be of keen interest, given its status as the only known recently active HERV, its promoter activities, and its coding capacity [64]. These studies should also benefit from assessment of the allelic presence of insertionally polymorphic members, given their functional potential and inferred capability to generate new viruses through recombination [64,68].

The Evolution of DUX Incorporates Species-Specific ERV Activation
It is worth revisiting the case of the DUX TF homologs for the ability to interact with conventional gene promoters, as well as those of LTRs. Dux and DUX4 (mouse and human, respectively) are intronless retroposed homologs originally derived from processed mRNAs of an ancestral DUX gene, DUXC [228][229][230]. Dux and DUX4 later expanded within macrosatellite arrays in both mice and humans; the intron-containing ancestor was subsequently lost from both species, but its homologs are retained in arrays in Laurasiatherian models (e.g., dog, swine, and bovine), as well as Xenarthra (e.g., sloth) [228]. Afrotheria (e.g., elephant, hyrax, and tenrec) possess intronless arrayed homologs from an independently retroposed DUX ancestor [228]. These findings place a double homeobox ancestor in placental eutherians ~110 mya and highlight the complex DUX evolution within the species' lineages [228]. A single homeobox DUX ancestor is present in amphibians, reptiles, and non-eutherian mammals [228].
As discussed, human DUX4 activates human EGA genes, as well as LTRs belonging to HERV-L [201,204]. A functional analysis of human DUX4 expressed in mouse embryos revealed the activation of common 2C-like orthologous gene promoters but not MERV-L [204]. Of note, both homologs also appear to activate some ERVL-MaLR in the same background, but these are reported to be mostly distinct subsets of elements (<4% in common, including just one common alternate promoter) [204]. Intriguingly, canine DUXC expressed in a cultured dog cell model has recently been shown to activate common mouse 2C gene homologs, as well as LTRs of broadly classified ERV groups (e.g., ERV1, ERVL, and ERVL-MaLR MLT1), though the subgroup specificity of the expression is not yet clear [199]. As with human DUX4, canine DUXC expressed in mouse embryos results in the activation of 2C-like gene promoters but not MERV-L [204]. Together, these observations suggest that DUX homologs have maintained conserved properties of the transcriptional regulation of gene promoters but have evolved distinct associations with LTRs that may be attributed to the divergence of binding within DUX homologs and across species. For example, species-level comparisons of sequence targets and analyses of the protein structure reveal that, despite the high structural similarity between the two homeodomains, DUX homeodomains 1 and 2 exhibit different target DNA preferences [231]. For the case of ERV-L subgroups of humans and mice, one function of Dux appears to be the species-specific activation of exapted LTRs, with probable roles in genome activation and/or early cell fate specification. To our knowledge, DUXC transcriptional regulation of retroelements in dogs has not been further explored. The genome of the domestic dog has a relatively low representation of ERVs [18,232] but appears to have retained regulatory properties common to ones expressed in genome activation in humans and mice [42,198,199]. Given the observations in human and mouse DUX-derived functions, and the identified properties of DUXC, it will be interesting to see how the evolutionary history and activation of ERVs by DUXC plays out.
Of relevant note, the retroposed origination and repeated expansion of Dux and DUX4 have been suggested to have been driven by pressure to avoid the activation of propagating retroviruses at the time while maintaining early-stage gene control, reminiscent of mutational escape in an 'arms race', as previously proposed to explain KZFP evolution [201]. Such a scenario might account for the divergence of LTR recognition by DUX members but does not explain the species-specific retained functions exerted in regulating ERVs, as is evidenced from the examination of DUX homolog-mediated ERV activation between mice, humans, and dogs. We speculate on an alternative scenario in which DUX expansion instead took advantage of the ability to activate propagated LTRs and domesticated their use in species-specific embryogenesis regulatory networks.

Concluding Remarks
The layers of evolved complexity regarding the once-reputed 'junk' of our genomes are both astonishing and humbling. As inferred from the ERV fossil record, the scale of virus-host co-evolution spans over 450 million years. The emergence of ERV-repressive KZFPs exemplifies an early established interplay between virus and host and speaks to the importance of wielding ERVs as a functional resource in the subsequent shaping and diversification of genomic landscapes. Alongside this co-evolution between virus and host, the propagation of ERV lineages and DUX homolog expansion is reminiscent of a similar scenario of the exploitation of ERVs for bona fide functions, rather than to escape ERV activation in a strict 'arms race'. The co-evolutionary outcomes are truly remarkable. Our genomes have commandeered ERVs for key roles in many biological processes, and these ERVs are controlled for individual functions (e.g., syncytins), the expression of broad group members (e.g., viral mimicry), and lineage-specific regulation (e.g., immune signaling and early cellular development). The mechanisms contributing to ERV transcriptional control are being disentangled, but layers of complexity undoubtedly remain. ERVs that are tightly regulated during early development can later unleash alternate promoters and enhancers of proto-oncogenes upon the loss of control. The aberrant expression of ERVs in epigenetically altered environments appears to involve a relatively limited number of ERVs compared to those genome-wide but reflects a high degree of heterogeneity in the expressed lineages and subgroups. Conversely, early development appears to control the expression of specific ERV lineages in a highly regulated manner in what seems to be a theme of placental mammals. Fuller annotations of ERV-associated transcripts should provide further insight into their involvement in these and other cellular environments. While the properties of ERVs continue to be better understood in these diverse biological contexts, it is important to keep in mind that many expressed ERV groups are still not well characterized. In this regard, understanding the properties of all involved ERV groups should significantly aid in their future study. Given the range and depth of orthogonal technologies now in use to interrogate the genome and its much-accumulated, once-termed 'junk', it is a truly exciting time for what lies in store.
Author Contributions: J.V.H. and A.S.J. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Figure 1. Structures and features of major retroelement types. Representations of canonical LTR and non-LTR retroelements are depicted. (A) Structure of a full-length ERV. Transcription signals are labeled in the LTRs for transcription initiated by RNA Polymerase II and Poly(A) stop signal. LTRs: U3, dark grey; R, black; U5, light grey. The minimal viral genes of an autonomous ERV are shown: gag, pro/pol, and env. All proviruses possess short 4-6 bp target site duplications (TSDs), as shown by the short flanking arrows. Non-autonomous ERV derivatives exist, such as those lacking env or pol and env (also refer to the main text). (B) Non-LTR retroelements include the long and short interspersed elements (LINE and SINE). A full-length retrotransposition-competent LINE encodes two protein-coding open reading frames, ORF1 and ORF2, which, when translated, provide the necessary functions for mobilization. LINEs are autonomous elements that drive the retrotransposition of their own transcribed RNA intermediate or that from transcribed non-autonomous retroelements, including SINE. Therefore, non-LTR retroelements bear the hallmarks of LINE-mediated mobilization. LINE elements are transcribed by RNA Polymerase II and SINE by RNA Polymerase III. Due to distinct mechanisms of ERV and LINE integration, the TSDs of LINE-mobilized retroelements are of an average longer length (~15 bp), as depicted by the arrows flanking each element type.


Figure 2. Evolution and allelic presence of ERV retroelements. (A) Full-length ERVs reflecting prototypical ages are depicted. Upper: 'young' ERV copy with few changes present; identical LTRs; and retained gag, pol, and env ORFs; Middle: 'old' ERV with many accumulated mutations, various deletions, and loss of gene coding capacity; Lower: ERV possessing an env ORF despite many proximal accumulated mutations and loss of other ORFs, indicative of retained coding function of the gene. Vertical lines represent mutations; dashed lines represent deleted proviral sequences. (B) Recombinational deletion results in the formation of a solo-LTR with the loss of the internal viral coding sequence but retention of the modular LTR form and its intrinsic sequence properties. Matched TSDs are likewise present following canonical solo-LTR formation (flanking arrows). (C) Possible alleles present for an ERV-derived locus post-integration. Upper: full-length; Middle: solo-LTR resulting from 5′-3′ LTR recombination; Lower: prior to fixation of the insertion, a third 'unoccupied' allele can be present. ERV loci for which variable alleles are present within individuals of a host population are referred to as 'insertionally polymorphic'.

Figure 3. Overview of ERV control and ERV-associated transcripts. (A) ERV LTRs possess intrinsic features for transcriptional activity that can promote their expression and use as promoters or enhancers, such as transcription factor binding sites, as well as transcriptional signals recognized by RNA Polymerase II (summarized in green). Silencing of ERVs is achieved via epigenetic repressive modifications, including histone modifications and DNA methylation. A potent mechanism of silencing is the binding of the ERV primer binding site (PBS; labeled in orange) used during reverse transcription. Repressive binding of the PBS is mediated by a member of the Krüppel-associated box zinc finger protein family (KZFP; labeled in red). KZFP subsequently scaffolds epigenetic silencing complexes to exert potent silencing and promote heterochromatin spreading (summarized in red). The modular nature of an ERV LTR is depicted showing the unique (U3 and U5) and repeat (R) segments. (B) Examples of ERV-associated transcripts observed in tissues (also refer to the main text).
